Web Data Extraction: Challenges and Applications

Dr. Mohammed Sayed Kayed

Abstract

The Deep Web contains orders of magnitude more valuable information than the surface Web. Making this information understandable to machines requires substantial effort, since the pages are generated for visualization, not for data exchange. Extracting information from the pages of searchable Websites has therefore become a key step in Web information integration, and generating a data extractor (wrapper) for a given search form is a necessity. In this talk, I will introduce the data extraction problem, discuss its core tasks and challenges, and share our research contributions in the field of Web data extraction. I will also present one interesting application of Web data extraction, called “Gadget Creation on Web Portals”. In this application, extracted data can be immediately reused on personal portals by existing presentation components such as maps, calendars, tables, and lists. The underlying technique of gadget creation is FivaTech, an unsupervised Web data extraction approach that was proposed to wrap data (usually in XML format).

Bio

Dr. Mohammed Kayed received his PhD degree from Beni-Suef University, Egypt, in 2007. He is an associate professor at the Faculty of Computers and Information, Beni-Suef University, Egypt. His research interests include Web content mining, opinion mining, information extraction, and information retrieval. He was a member of the Database Lab in the Department of Computer Science and Information Engineering at National Central University, Taiwan. Currently, he is the head of the CS department at the Faculty of Computers and Information, Beni-Suef University, Egypt.